We describe a simplifier for use in program manipulation and verification. The simplifier finds a
normal form for any expression over the language consisting of individual variables, the usual
boolean connectives, equality, the conditional function cond (denoting if-then-else), the numerals,
the arithmetic functions and predicates +, − and ≤, the LISP constants, functions and predicates
nil, car, cdr, cons and atom, the functions store and select for storing into and selecting from
arrays, and uninterpreted function symbols. Individual variables range over the union of the
reals, the set of arrays, LISP list structure and the booleans true and false.

The simplifier is complete; that is, it simplifies every valid formula to true. Thus it is also a
decision procedure for the quantifier-free theory of reals, arrays and list structure under the
above functions and predicates.


The organization of the simplifier is based on a method for combining decision procedures for
several theories into a single decision procedure for a theory combining the original theories.
More precisely, given a set S of functions and predicates over a fixed domain, a satisfiability
program for S is a program which determines the satisfiability of conjunctions of literals (signed
atomic formulas) whose predicate and function symbols are in S. We give a general procedure
for combining satisfiability programs for sets S and T into a single satisfiability program for
S ∪ T, given certain conditions on S and T.
The simplifier described in this paper is currently used in the Stanford Pascal Verifier.

1.  Introduction


In this section we give some examples of simplifications. We also specify the syntax and semantics of
the language accepted by our simplifier. In section 2, we give a precise definition of a satisfiability
program for a set S of functions, predicates, and constants. Essentially, such a program determines
the satisfiability of conjunctions of literals (signed atomic formulas) whose predicate and function
symbols are in S. The formal definition specifies the interpretations of the elements of S in such a
way that it makes sense to "merge" satisfiability procedures for two sets S and T into one for S ∪ T.
We give a method for doing this, based on Craig's interpolation lemma ([Craig 1957]). Section 3
shows how a satisfiability procedure can be used to implement a simplifier for general expressions.
1.1  Examples of the Use of the Simplifier

Here are some examples of simplifications. Each input is followed by its simplified form.

2 + 3 * 5;
17;

P ⊃ ¬P;
¬P;

X = F(X) ⊃ F(F(F(X))) = X;
true;

cons(X, Y) = Z ⊃ car(Z) + cdr(Z) − X − Y = 0;
true;

X ≤ Y ∧ Y + D ≤ X ∧ 3 * D ≥ 2 * D ⊃ V[2 * X − Y] = V[X + D];
true;

A = store(store(A, I, A[J]), J, A[I]) ⊃ A[I] = A[J];
true;

The  last  formula  states  the  theorem  that  if  the  Ith  and  Jth  elements  of  A  are  swapped,  and  if 
the  resulting  array  equals  the  original  one,  then  the  Ith  and  Jth  elements  are  equal. 


1.2  The Theories 𝒵, ℒ, 𝒜 and ℰ

All of the theories which we consider are formalised in classical first-order logic with equality,
extended to include the three-argument function cond, where cond(p, a, b) means "if p then a else
b". The logical symbols are =, ∧, ∨, ¬, ⊃, cond, ∀ and ∃. A theory is determined by its non-logical
symbols (that is, its constant, function, and predicate symbols) and its axioms.

The functions, predicates, and constants to which the simplifier currently gives an
interpretation are those of the theory of reals under addition, the theory of list structure with car,
cdr, cons, atom and nil, and the theory of arrays under storing (store) and selecting (select). We
call these theories 𝒵, ℒ and 𝒜 respectively.

Given a quantifier-free expression F, the simplifier tries to find the simplest F′ such that
F = F′ is entailed by the axioms for 𝒵, ℒ and 𝒜. In particular, if F is a formula entailed by the
axioms, the simplifier returns the boolean constant true.

The non-logical symbols of 𝒵 are +, −, <, ≤, >, ≥ and the numerals. Its axioms are:

x + 0 = x
x + −x = 0
(x + y) + z = x + (y + z)
x + y = y + x
x ≤ x
x ≤ y ∨ y ≤ x
x ≤ y ∧ y ≤ x ⊃ x = y
x ≤ y ∧ y ≤ z ⊃ x ≤ z
x ≤ y ⊃ x + z ≤ y + z
0 ≠ 1
0 ≤ 1

The numerals 2, 3, ... and <, >, and ≥ are defined in terms of 0, 1, +, − and ≤ in the usual
way. We also allow multiplication by integer constants; for instance, 2 * x abbreviates x + x.

The  integers,  rationals  and  reals  are  all  models  for  these  axioms.  Any  formula  which  is 
unsatisfiable  over  the  rationals  or  reals  can  be  shown  unsatisfiable  as  a  consequence  of  these  axioms. 
Thus  our  simplifier  is  complete  for  the  rationals  or  reals.  It  is  not  complete  if  the  variables  range 
over the integers, since there are unsatisfiable formulas, such as x + x = 5, which cannot be shown
unsatisfiable  as  a  consequence  of  the  above  axioms.  In  this  respect,  our  simplifier  does  not  differ 
from  most  theorem  provers.  The  reason  for  the  incompleteness  is  that  determining  the 
unsatisfiability  of  a  conjunction  of  integer  linear  inequalities  —  the  integer  linear  programming 
problem  —  is  much  harder  in  practice  than  determining  the  satisfiability  of  a  conjunction  of 
rational  linear  inequalities.  This  incompleteness  is  not  as  bad  as  it  seems,  since  most  formulas  that 
arise  in  program  verification  and  program  manipulation  do  not  depend  on  subtle  properties  of  the 
integers. Further, there are some easily-implemented heuristics (such as converting x < y into
x + 1 ≤ y) for integer variables which work well in practice.
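The gap between the rationals and the integers can be seen on the formula x + x = 5 itself. The sketch below is only a sanity check under stated assumptions: the names `satisfies`, `rational_witness`, `integer_witnesses` and `heuristic_sound` are illustrative, and the integer search covers a finite range (the real reason no integer works is parity, which the axioms above cannot express).

```python
from fractions import Fraction

def satisfies(x):
    """The formula x + x = 5 from the text."""
    return x + x == 5

# Over the rationals (and hence the reals) the formula has a witness.
rational_witness = Fraction(5, 2)

# Over the integers a bounded search finds none (assumption: range chosen arbitrarily).
integer_witnesses = [x for x in range(-1000, 1001) if satisfies(x)]

# The heuristic mentioned in the text: for integers, x < y holds exactly when x + 1 <= y.
heuristic_sound = all(
    (x < y) == (x + 1 <= y) for x in range(-5, 6) for y in range(-5, 6)
)
```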

The theory of arrays, 𝒜, has the non-logical symbols store and select, and the axioms:

select(store(v, i, e), j) = cond(i = j, e, select(v, j))
store(v, i, select(v, i)) = v
store(store(v, i, e), i, f) = store(v, i, f)
i ≠ j ⊃ store(store(v, i, e), j, f) = store(store(v, j, f), i, e)

select(v, i) is the ith component of the one-dimensional array v. We may write v[i] for
select(v, i). store(v, i, e) is the vector whose ith component is e and whose jth component, for j ≠ i,
is the jth component of v. Thus, if the program variable A has the value A0 before the assignment
A[i] ← e, then afterwards A will have the value store(A0, i, e). A two-dimensional array can be
treated as a vector of vectors, so A[i, j] is shorthand for A[i][j]. Had the assignment above been
A[i, j] ← e, the value of A after the assignment would be store(A0, i, store(A0[i], j, e)).

The  last  three  axioms  are  only  needed  if  equalities  between  array  terms  are  allowed. 
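A small executable model can make the array axioms concrete. The sketch below assumes fixed-length Python tuples as arrays with in-range integer indices (a restriction the theory itself does not have); `axioms_hold`, `check_axioms` and `store2` are names introduced here for illustration.

```python
from itertools import product

def store(v, i, e):
    """Functional update: a copy of tuple v whose ith component is e."""
    return v[:i] + (e,) + v[i + 1:]

def select(v, i):
    return v[i]

def cond(p, a, b):
    return a if p else b

def axioms_hold(v, i, j, e, f):
    """Evaluate the four array axioms at one choice of v, i, j, e, f."""
    a1 = select(store(v, i, e), j) == cond(i == j, e, select(v, j))
    a2 = store(v, i, select(v, i)) == v
    a3 = store(store(v, i, e), i, f) == store(v, i, f)
    a4 = (i == j) or (store(store(v, i, e), j, f) == store(store(v, j, f), i, e))
    return a1 and a2 and a3 and a4

def check_axioms(length=3, values=(0, 1)):
    """Exhaustively test the axioms on all small arrays (a check, not a proof)."""
    idx = range(length)
    return all(
        axioms_hold(v, i, j, e, f)
        for v in product(values, repeat=length)
        for i, j in product(idx, idx)
        for e, f in product(values, values)
    )

def store2(a, i, j, e):
    """The assignment A[i, j] <- e on a vector of vectors."""
    return store(a, i, store(select(a, i), j, e))
```

For example, `store2(((0, 0), (0, 0)), 1, 0, 7)` evaluates the nested term store(A0, i, store(A0[i], j, e)) with i = 1, j = 0, e = 7.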

The theory of list structure, ℒ, has the non-logical symbols car, cdr, cons, atom and nil, and
the axioms:

car(cons(x, y)) = x
cdr(cons(x, y)) = y
¬atom(x) ⊃ cons(car(x), cdr(x)) = x
¬atom(cons(x, y))
atom(nil)

Notice that acyclicity is not assumed; for instance, car(x) = x is regarded as satisfiable.
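To see why car(x) = x is consistent with these axioms, it helps to build a model containing such an x. The following is a minimal sketch, not the paper's representation: cells are made unique per (car, cdr) pair so that equality of list structure is object identity, and `Cell`, `_cells` and `self_car_cell` are hypothetical names.

```python
class Cell:
    """A cons cell. Cells compare by identity, so equal structure means the same object."""
    __slots__ = ("car", "cdr")

NIL = "nil"     # any value that is not a Cell counts as an atom
_cells = {}     # (car, cdr) -> the unique Cell with those components

def cons(x, y):
    """Return the unique cell whose car is x and whose cdr is y."""
    if (x, y) not in _cells:
        c = Cell()
        c.car, c.cdr = x, y
        _cells[(x, y)] = c
    return _cells[(x, y)]

def car(x):
    return x.car

def cdr(x):
    return x.cdr

def atom(x):
    return not isinstance(x, Cell)

def self_car_cell():
    """Build a cell x with car(x) = x; nothing in the axioms forbids it."""
    x = Cell()
    x.car, x.cdr = x, NIL
    _cells[(x, NIL)] = x   # register it so cons(car(x), cdr(x)) returns x itself
    return x
```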

Finally, it is technically convenient to define the theory ℰ whose non-logical symbols are all
uninterpreted function, constant, and predicate symbols and which has no axioms. The theorems of
ℰ follow from the properties of equality; hence its name.

2.  Merging  Satisfiability  Programs 

In  this  section,  we  define  satisfiability  program.  We  then  show  how  to  "merge"  satisfiability 
programs  for  two  theories  which  have  no  common  non-logical  symbols. 

2.1  Satisfiability  Programs 

If S is a theory, then a term is an S-term if each non-logical symbol occurring in the term is a
non-logical symbol of S. We define S-literal and S-formula analogously. For example, x = y and
x ≤ y + 3 are 𝒵-literals but x ≤ car(y) is not. Notice that a term is an ℰ-term if it contains only
uninterpreted function symbols.

If S is a theory, a satisfiability program for S is a program which determines whether a
conjunction L1 ∧ ... ∧ Lk of S-literals is satisfiable in S. A satisfiability program is therefore a
decision procedure for satisfiability for conjunctions of literals.

We use the name of a theory to denote both its satisfiability program and the conjunction of
its axioms; for example, we may say that 𝒵 ∧ 0 > 1 is unsatisfiable, or that the size of 𝒵 is 3.5K.


There are efficient satisfiability programs for 𝒵, ℰ and ℒ. For 𝒵, the simplex algorithm is
very fast in practice ([Nelson 1978]). [Nelson and Oppen 1978] describe satisfiability programs for
ℒ and ℰ which determine the satisfiability of conjunctions of length n in time O(n²). [Johnson and
Tarjan 1977] have improved the underlying algorithm to O(n log² n). [Oppen 1978] describes a
satisfiability program for ℒ which runs in linear time if list structure is assumed to be acyclic. The
satisfiability problem for conjunctions of 𝒜-literals is NP-complete ([Downey and Sethi 1976]).
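To give a feel for what a satisfiability program for ℰ does, here is a deliberately naive congruence-closure sketch. It is not the algorithm of the cited papers and makes no attempt to meet their time bounds; terms are assumed to be nested tuples `(f, arg1, ...)` with strings as variables, and the function name `satisfiable` is illustrative.

```python
def subterms(t, acc):
    """Collect t and all of its subterms into the set acc."""
    acc.add(t)
    if isinstance(t, tuple):
        for arg in t[1:]:
            subterms(arg, acc)
    return acc

def satisfiable(equalities, disequalities):
    """Decide a conjunction of equalities and disequalities between E-terms."""
    terms = set()
    for a, b in equalities + disequalities:
        subterms(a, terms)
        subterms(b, terms)
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in equalities:
        union(a, b)
    apps = [t for t in terms if isinstance(t, tuple)]
    changed = True
    while changed:                      # close under congruence until a fixed point
        changed = False
        for s in apps:
            for t in apps:
                if (find(s) != find(t) and s[0] == t[0] and len(s) == len(t)
                        and all(find(x) == find(y) for x, y in zip(s[1:], t[1:]))):
                    union(s, t)         # same function, pairwise equal arguments
                    changed = True
    return all(find(a) != find(b) for a, b in disequalities)
```

With X = F(X) asserted and F(F(F(X))) ≠ X as the only disequality, the closure merges all four terms, so the conjunction is reported unsatisfiable; this is the third example of section 1.1 read as a refutation.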


2.2  Example  of  the  Joint  Satisfiability  Procedure 

We illustrate how 𝒵, ℒ and ℰ together detect the unsatisfiability of the following conjunction F:

X ≤ Y ∧ Y ≤ X + car(cons(0, X)) ∧ P(F(X) − F(Y)) ∧ ¬P(0)

We call a formula homogeneous if all its non-logical symbols are from the same theory. The
first step we take is to make each atomic formula homogeneous, by introducing new variables to
replace terms of the wrong "type" and adding equalities defining these new variables. For instance,
the second conjunct would be a 𝒵-literal except that it contains the term car(cons(0, X)), which is
not a 𝒵-term. We therefore replace car(cons(0, X)) by a new variable, say G1, and add to the
conjunction the equality G1 = car(cons(0, X)) defining G1. By continuing in this fashion we
eventually obtain a formula F′ which is satisfiable if and only if F is, with each literal of F′
homogeneous. In our example, F′ is

X ≤ Y ∧ Y ≤ X + G1 ∧ P(G2) ∧ ¬P(G5)
∧ G1 = car(cons(G5, X)) ∧ G2 = G3 − G4
∧ G3 = F(X) ∧ G4 = F(Y) ∧ G5 = 0
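The step of making literals homogeneous can be sketched as a recursive walk that names every subterm belonging to a foreign theory. This is an illustration under assumptions, not the authors' code: terms are nested tuples, variables are strings, numerals are Python ints, the `THEORY` table and the names `theory_of` and `purify` are invented here, and the fresh variables come out numbered in a different order from the G1 ... G5 of the text.

```python
THEORY = {"+": "Z", "-": "Z", "<=": "Z", "car": "L", "cdr": "L", "cons": "L"}

def theory_of(term):
    """Numerals belong to Z; any symbol not listed in THEORY is uninterpreted (E)."""
    if isinstance(term, int):
        return "Z"
    return THEORY.get(term[0], "E")

def purify(literals):
    """Return homogeneous literals plus the equalities defining the new variables."""
    names, defs, out = {}, [], []

    def pure(term, owner):
        if isinstance(term, str):                 # individual variable: shared by all
            return term
        home = theory_of(term)
        if home != owner:                         # alien subterm: replace by a variable
            if term not in names:
                names[term] = "G%d" % (len(names) + 1)
                defs.append(("=", names[term], pure(term, home)))
            return names[term]
        if isinstance(term, int):
            return term
        return (term[0],) + tuple(pure(a, owner) for a in term[1:])

    for lit in literals:
        negated = lit[0] == "not"
        atom = lit[1] if negated else lit
        owner = theory_of(atom)
        pure_atom = (atom[0],) + tuple(pure(a, owner) for a in atom[1:])
        out.append(("not", pure_atom) if negated else pure_atom)
    return out + defs

F = [
    ("<=", "X", "Y"),
    ("<=", "Y", ("+", "X", ("car", ("cons", 0, "X")))),
    ("P", ("-", ("F", "X"), ("F", "Y"))),
    ("not", ("P", 0)),
]
F_prime = purify(F)
```

On this input the sketch introduces five variables and produces nine literals, the same counts as F′ above; the numeral 0 is named once and reused in both places where it occurs.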

We next divide F′ up into three conjunctions F_𝒵, F_ℰ and F_ℒ. F_𝒵 contains all the 𝒵-literals, F_ℰ
all the ℰ-literals and F_ℒ all the ℒ-literals. Here is F′ divided up into homogeneous parts:

F_𝒵                   F_ℰ                   F_ℒ

X ≤ Y                 P(G2) = true          G1 = car(cons(G5, X))
Y ≤ X + G1            P(G5) = false
G2 = G3 − G4          G3 = F(X)
G5 = 0                G4 = F(Y)

These three conjunctions are given to the three satisfiability programs 𝒵, ℰ and ℒ. Since
each conjunction is satisfiable by itself, there must be interaction between the programs for the
unsatisfiability to be detected. The interaction takes a particular, restricted form. We require that
each satisfiability program deduce and propagate to the other satisfiability programs all equalities
between variables entailed by the conjunction it is considering. For example, if X ≤ Y and Y ≤ X
are asserted to 𝒵, it must deduce and propagate to the other satisfiability programs the fact that
X = Y. The other satisfiability programs add X = Y to their conjunctions and the process continues.

In our example, neither F_𝒵 nor F_ℰ entails any equalities between variables, but F_ℒ entails
G1 = G5. ℒ propagates this equality. 𝒵 uses this equality to deduce and propagate X = Y. ℰ then
propagates G3 = G4. 𝒵 then propagates G2 = G5. Now ℰ has an inconsistent conjunction, and
signals unsatisfiable. The following shows the literals received by the satisfiability programs, and
the propagated equalities, listed in the order in which they were propagated.

𝒵                     ℰ                     ℒ

X ≤ Y                 P(G2) = true          G1 = car(cons(G5, X))
Y ≤ X + G1            P(G5) = false
G2 = G3 − G4          G3 = F(X)
G5 = 0                G4 = F(Y)

                                            G1 = G5
X = Y
                      G3 = G4
G2 = G5
                      unsatisfiable

If one of the conjunctions F_𝒵, F_ℰ and F_ℒ becomes unsatisfiable as a result of these
propagations, the original conjunction must be unsatisfiable. For 𝒵, ℰ and ℒ, the converse holds
as well; that is, if the original conjunction is unsatisfiable, then one of the conjunctions F_𝒵, F_ℰ and
F_ℒ will become unsatisfiable as a result of propagations of equalities between variables. For some
other theories, such as 𝒜, the converse does not hold. For these theories, a final "case-splitting" step,
described in the next section, is required.

It is important to realize that it is never necessary to propagate disequalities, nor equalities
other than those between variables. For instance, after receiving G1 = G5, there was no need for 𝒵
to propagate that Y ≤ X or that X = Y + G5, even though these were deducible facts. None of the
other satisfiability programs could make use of this information, since none of them knows anything
about ≤ or +. Further, no disequality need be propagated, even though every theory shares = and ≠.
A disequality x ≠ y is needed to prove inconsistency only if x = y is deduced. If some program
deduces x = y, it will propagate this fact to the other programs, and the one that has deduced x ≠ y
will detect the inconsistency.

Notice that the only satisfiability programs that can make use of a propagated equality
between two variables are those whose conjunctions contain occurrences of both variables. For
instance, when ℒ propagated G1 = G5, only 𝒵 ever made use of this equality. When equalities are
propagated, the only satisfiability programs that need to receive the equality are those which already
"know" about both variables in the equality.


2.3  Joint  Satisfiability  Procedure 

In this section we present the joint satisfiability procedure illustrated in the previous section. We
assume that we have two theories S and T with no common non-logical symbols. The case for
more than two theories follows easily.

Given a conjunction F of literals whose non-logical symbols are among those of S and T, the
joint satisfiability procedure determines whether F is satisfiable in the theory axiomatized by
S ∧ T. F_S and F_T are program variables containing conjunctions of literals.

1.  [Make F homogeneous.] Assign conjunctions to F_S and F_T by the method described in
section 2.2 so that F_S contains a conjunction of S-literals, F_T a conjunction of T-literals, and
F_S ∧ F_T is satisfiable if and only if F is.

2.  [Unsatisfiable?] If either F_S or F_T is unsatisfiable, return unsatisfiable.

3.  [Propagate equalities.] If either F_S or F_T entails some equality between variables not
entailed by the other, then add the equality as a new conjunct to the one that does not entail
it. Go to step 2.

4.  [Case split necessary?] If either F_S or F_T entails a disjunction u1 = v1 ∨ ... ∨ uk = vk of
equalities between variables, without entailing any of the equalities alone, then apply the
procedure recursively to the k formulas F_S ∧ F_T ∧ u1 = v1, ..., F_S ∧ F_T ∧ uk = vk. If any of
these formulas is satisfiable, return satisfiable. Otherwise return unsatisfiable.

5.  Return satisfiable.

If the procedure returns unsatisfiable, it is clear that F is unsatisfiable. We will prove in the
next section that the procedure is also correct if it returns satisfiable. The procedure always halts,
since each repetition of step 3 or recursive call in step 4 conjoins an equality to one of the
conjunctions F_S or F_T not previously entailed by the conjunction. This can happen at most n − 1
times, where n is the number of variables appearing after step 2, since there can be no more than
n − 1 non-redundant equalities between n variables.

We have not implemented the joint satisfiability procedure. It is subsumed by the
simplification algorithm described in section 3.

[Kaplan  1968]  proves  that  the  quantifier-free  theory  of  arrays  with  constant  indices  is 
decidable.  [Shostak  1978]  proves  that  quantifier-free  Presburger  arithmetic  with  uninterpreted 
function  symbols  is  decidable.  [Suzuki  and  Jefferson  1977]  prove  that  quantifier-free  Presburger 
arithmetic  with  arrays  is  decidable.  The  joint  satisfiability  procedure  provides  practical  decision 
procedures  for  each  of  these  theories. 

2.4  Convexity and Case Splitting

In this section, we characterize the theories which require case splitting.

A formula F is non-convex if there exist 2n variables x1, y1, ..., xn, yn, n ≥ 2, such that
F entails x1 = y1 ∨ ... ∨ xn = yn but for no i between 1 and n does F entail xi = yi. Otherwise, F is convex.

A theory S is convex if every conjunction of S-literals is convex. If the satisfiability
programs merged by the joint satisfiability procedure are satisfiability programs for convex theories,
case splitting never occurs. Case splitting may occur if one or more of the theories are non-convex.

𝒵 is convex, since the solution set of a conjunction of 𝒵-literals is the intersection of a
convex set with a finite number of complements of hyperplanes. Such a set cannot be a subset of a
union of finitely many hyperplanes unless it is a subset of one of them.

ℰ and ℒ are convex; this follows from the characterization in [Nelson and Oppen 1978] of
the set of equalities entailed by a conjunction of ℰ- or ℒ-literals.

𝒜 is not convex, as shown by the following example. Suppose that the theories merged by the
joint satisfiability procedure are 𝒜 and 𝒵, and that after step 1 the two formulas are

F_𝒜:  store(v, i, e)[j] = x ∧ v[j] = y

F_𝒵:  x > e ∧ x > y

Each formula is satisfiable, the whole conjunction is unsatisfiable, but there are no equalities
to propagate in step 3. In step 4, 𝒜 propagates the disjunction x = e ∨ x = y; each case leads to a
contradiction in 𝒵.
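The non-convexity claim can be checked by enumerating small models of the array formula. This is a finite sanity check under assumptions (length-2 tuples as arrays, three possible element values, illustrative names), not a proof of the entailment in general, although each counter-model it finds is a genuine one.

```python
from itertools import product

def store(v, i, e):
    return v[:i] + (e,) + v[i + 1:]

def models(values=(0, 1, 2), length=2):
    """Enumerate assignments satisfying store(v, i, e)[j] = x and v[j] = y."""
    for v in product(values, repeat=length):
        for i, j in product(range(length), repeat=2):
            for e in values:
                yield {"x": store(v, i, e)[j], "y": v[j], "e": e}

all_models = list(models())

# The disjunction x = e or x = y holds in every model ...
entails_disjunction = all(m["x"] == m["e"] or m["x"] == m["y"] for m in all_models)
# ... but neither disjunct holds in every model.
entails_x_eq_e = all(m["x"] == m["e"] for m in all_models)
entails_x_eq_y = all(m["x"] == m["y"] for m in all_models)
# No model also satisfies the arithmetic part x > e and x > y.
jointly_satisfiable = any(m["x"] > m["e"] and m["x"] > m["y"] for m in all_models)
```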

We plan to extend 𝒵 to be complete for the integers. It will then no longer be convex, since
for example x = 1 ∧ y = 2 ∧ 1 ≤ z ∧ z ≤ 2 entails the disjunction x = z ∨ y = z without entailing
either disjunct alone. However, since we need only propagate equalities between variables, not
between variables and constants, literals such as 1 ≤ z ≤ 100 will not cause splits (unless there are
100 variables equal to 1, 2, ..., 100 respectively!).

The theory of sets, which we intend to add to the simplifier, is another example of a
non-convex theory; for example, x ∈ {y, z} causes the case split x = y ∨ x = z.

Non-convexity complicates simplification. If a case split occurs for which some, but not all,
cases are satisfiable, a good simplifier must determine which of the cases are satisfiable. To see this,
consider the problem of simplifying x ∈ {4, −6} ∧ x > 0 to x = 4. This conjunction of literals is
satisfiable, as the joint satisfiability procedure determines by doing the case split x = 4 ∨ x = −6. The
simplifier must discover that the satisfiable branch of the split is the one in which x = 4.




2.5  Correctness  of  the  Joint  Satisfiability  Procedure 

The proof of correctness requires several lemmas. Our first goal is to define the residue of a
formula. Essentially the residue is the strongest boolean combination of equalities between variables
which the formula entails. For example the residue of the formula x = f(a) ∧ y = f(b) is
a = b ⊃ x = y, and the residue of x ≤ y ∧ y ≤ x is x = y.

We make the following assumptions about the underlying formal system: (1) Individual
variables are distinguishable from function variables. (2) There is no quantification over functions
or predicates. (3) There are no propositional variables. The third restriction is not essential, but it
simplifies the statement of the proof.

A parameter of a formula is any non-logical atomic symbol which occurs free in the formula.
Thus the parameters of a = b ∨ ∀x P(x, f(x)) = c are a, b, P, f, and c.

We define a simple formula to be one whose only parameters are individual variables. For
instance, x ≠ y ∨ z = y and ∀x x ≠ y are simple, but x < y and f(x) = y are not. Thus an unquantified
simple formula is a propositional formula whose atomic formulas are equalities between individual
variables. The next lemma characterizes quantified simple formulas.

Lemma  1:  Every  quantified  simple  formula  F  is  equivalent  to  some  unquantified  simple 
formula  G.  G  can  be  chosen  so  that  its  variables  are  all  free  variables  of  F. 

Proof: Suppose F is of the form ∃x φ(x). Let φ0 be the formula resulting from φ by first
replacing any occurrences of x = x and x ≠ x by true and false respectively, and then replacing any
remaining equality involving x by false. Then, if v1, ..., vk are the parameters of φ, F is
equivalent to φ0 ∨ φ(v1) ∨ ... ∨ φ(vk), since, in any interpretation, x either equals one of the vi or
else differs from all of them. By repeatedly eliminating quantifiers in this manner, we eventually
obtain an equivalent quantifier-free formula whose only variables are free variables of F.
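As a worked instance of this elimination step (the formula is chosen here for illustration and does not appear in the paper), take F to be ∃x (x = a ∧ x ≠ b), so that φ(x) is x = a ∧ x ≠ b and the parameters other than x are a and b. Replacing every equality involving x by false gives φ0, and substituting a and b for x gives the other two disjuncts:

```latex
\begin{align*}
\varphi_0  &\equiv \mathit{false} \land \lnot\,\mathit{false} \equiv \mathit{false} \\
\varphi(a) &\equiv (a = a \land a \neq b) \equiv a \neq b \\
\varphi(b) &\equiv (b = a \land b \neq b) \equiv \mathit{false} \\
\exists x\,(x = a \land x \neq b) &\equiv \varphi_0 \lor \varphi(a) \lor \varphi(b) \equiv a \neq b
\end{align*}
```

The result a ≠ b is unquantified, simple, and mentions only free variables of F, as the lemma requires.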

Any interpretation ψ for a formula F determines an equivalence relation ~ on the free
variables of F by the rule u ~ v if and only if ψ(u) = ψ(v). It follows from lemma 1 that if F is
simple, ~ completely determines whether ψ satisfies F.

Lemma 2: (Craig's interpolation lemma) If F entails G, then there exists a formula H such
that F entails H and H entails G, and each parameter of H is a parameter of both F and G.

Proof: see [Craig 1957].

Lemma 3: If F is any formula, then there exists a simple formula Res(F), the residue of F,
which is the strongest simple formula that F entails; that is, if H is any simple formula entailed by F,
then Res(F) entails H. Res(F) can be written so that its only variables are free variables of F.

Proof: Let {Gλ} be the set of all simple formulas which F entails. For each Gλ choose Hλ so
that F ⊃ Hλ ⊃ Gλ, the only parameters of Hλ are parameters of both F and Gλ, and Hλ is
unquantified. The existence of Hλ is guaranteed by lemmas 1 and 2. Now, each Hλ is a
propositional formula whose atomic formulas are equalities between individual parameters of F. It is
easy to show that an infinite conjunction of propositional formulas over a finite set of atomic
formulas is equivalent to some finite propositional formula over these atomic formulas. Therefore
the conjunction of the Hλ is equivalent to some finite subconjunction H. Any simple formula
entailed by F is entailed by some Hλ, and so by H. The only parameters of H are free individual
parameters of F. Thus H is the residue of F.

Here  are  some  examples  of  residues. 

Formula                                   Residue

x = f(a) ∧ y = f(b)                       a = b ⊃ x = y
x + y − a − b > 0                         ¬(x = a ∧ y = b) ∧ ¬(x = b ∧ y = a)
x = store(v, i, e)[j]                     i = j ⊃ x = e
x = store(v, i, e)[j] ∧ y = v[j]          cond(i = j, x = e, x = y)

Notice  in  the  last  two  formulas  how  the  addition  of  an  individual  variable  as  a  "label"  affects 
the  residue. 

As a final example to relate the notion of residue to that of joint satisfiability, here are the
residues of the formulas which appeared in the example of section 2.2:

𝒵                         ℰ                     ℒ

X ≤ Y                     P(G2)                 G1 = car(cons(G5, X))
Y ≤ X + G1                ¬P(G5)
G2 = G3 − G4              G3 = F(X)
G5 = 0                    G4 = F(Y)

Residues:

G5 = G1 ⊃ X = Y           G2 ≠ G5               G1 = G5
G3 = G4 ≡ G2 = G5         X = Y ⊃ G3 = G4

As we found in section 2.2, the residues are inconsistent. An essential fact needed for proving
the correctness of the joint satisfiability procedure is that these residues are always inconsistent if the
original formula is. This fact is a consequence of the following lemma.

Lemma 4: If A and B are formulas whose only common parameters are individual variables,
then Res(A ∧ B) ≡ Res(A) ∧ Res(B).

Proof: Obviously the left side of the equivalence entails the right side, so we need only show
the converse.

From A ∧ B ⊃ Res(A ∧ B) we get A ⊃ (B ⊃ Res(A ∧ B)) and so, by Craig's interpolation
lemma, there is a formula H entailed by A which entails B ⊃ Res(A ∧ B) and whose only
parameters are parameters of both A and B. But these must be individual variables, so H is simple and
therefore Res(A) ⊃ (B ⊃ Res(A ∧ B)). Writing this as B ⊃ (Res(A) ⊃ Res(A ∧ B)), and observing that
the right hand side is simple, we have Res(B) ⊃ (Res(A) ⊃ Res(A ∧ B)), or, equivalently,
Res(A) ∧ Res(B) ⊃ Res(A ∧ B), which proves the lemma.

Lemma 5: Let F1, F2, ..., Fn be simple, convex formulas and V be the set of all variables
appearing in any Fi. Suppose that for all x, y in V and for all i, j from 1 to n, either both Fi and Fj
entail x = y, or neither do. Then F1 ∧ F2 ∧ ... ∧ Fn is satisfiable if and only if each Fi is satisfiable.

Proof: The "only if" part is obvious. To prove the "if" part, assume that each Fi is satisfiable.
Let S be the set of equalities between variables in V entailed by some (hence all) of the Fi and T be
the set of all other equalities between variables of V. We claim that any interpretation which makes
every equality in S true and every equality in T false satisfies each Fi. If it does not satisfy Fj, then
Fj entails the disjunction of all equalities in T. Now we consider three cases. If T is empty, Fj is
unsatisfiable. If T contains only one equality, it is entailed by Fj and so it is in S. If T contains more
than one equality, Fj is non-convex. Each case contradicts our assumptions.

We can now complete the proof of correctness of the joint satisfiability procedure by showing
that if it returns satisfiable, F is satisfiable. To show that F is satisfiable, it suffices to show that
Res(S ∧ F_S ∧ T ∧ F_T) is not the constant false. But by lemma 4, this residue is equivalent to
Res(S ∧ F_S) ∧ Res(T ∧ F_T). If step 5 of the procedure is reached, each of these residues must be
convex, since step 4 did not cause a case split. Furthermore, the residues entail the same set of
equalities and are each satisfiable, since steps 2 and 3 were passed. By lemma 5, the conjunction of
the residues is satisfiable. Thus F is satisfiable if the algorithm returns from step 5. It follows, by
induction on the depth of recursion, that F is satisfiable whenever step 4 returns satisfiable.

3.  Simplification  Based  on  Satisfiability  Programs 

In  section  3.1  we  describe  cond  normal  form  for  boolean  expressions.  In  section  3.2  we  give  a 
simplification  algorithm  for  formulas  in  cond  normal  form.  In  section  3.3  we  discuss  some  aspects  of 
the  efficiency  of  our  simplification  algorithm,  and  in  section  3.4  discuss  some  of  its  deficiencies. 


3.1  Cond  Normal  Form 


For  convenience  we  use  LISP  list  notation  in  this  section.  That  is,  the  term  f(a,  b)  is  denoted  (f  a  b). 


Our simplifier first puts expressions into cond normal form. This is similar to the cond normal
form in [McCarthy 1963]. An expression is in cond normal form if:

(1)  The expression does not contain any boolean connectives other than cond. Thus A ∧ B is
replaced by the equivalent (cond A B false), and ¬A by (cond A false true).

(2)  No  first  argument  to  a  cond  is  a  cond.  Thus  (cond  (cond  P  A  B)  C  D)  is  replaced  by 
(cond  P  (cond  A  C  D)  (cond  B  C  D)). 

(3)  No  expression  of  the  form  (cond  P  A  B)  is  the  argument  to  any  function  other  than  cond; 
thus  (F  (cond  P  A  B))  is  replaced  by  (cond  P  (F  A)  (F  B)). 

(4)  Every  boolean  subexpression,  other  than  constant  subexpressions  true  and  false,  is  the 
first  argument  to  a  cond.  For  instance,  a  single  atomic  formula  P  which  is  not  the  first  argument  to 
a cond is replaced by (cond P true false). F(X = Y) is successively replaced by (F (cond (= X Y) true
false)) and (cond (= X Y) (F true) (F false)).

(In  practice,  the  transformation  required  by  (4)  is  not  carried  out  if  the  subexpression  is  a 
second  or  third  argument  to  cond,  since  this  would  waste  space.  If  A  and  B  are  boolean,  the  cond 
normal  form  of  (cond  P  A  B)  is  (cond  P  (cond  A  true  false)  (cond  B  true  false))  but  we  store  it 
as  (cond  P  A  B).) 

Cond  normal  form  is  not  a  canonical  form,  since  two  syntactically  different  expressions,  each 
in  cond  normal  form,  may  be  logically  equivalent. 
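Rules (1) to (3) above can be sketched as two small rewriting passes over expressions written as nested tuples in the LISP style of this section. The sketch makes several assumptions: the connective names "and", "or" and "not" are chosen here, the clause for "or" is an addition not spelled out in the text, rule (4) is left out because it needs to know which subexpressions are boolean, and the function names `to_cond` and `normalize` are illustrative.

```python
def to_cond(e):
    """Rule (1): rewrite boolean connectives into cond."""
    if not isinstance(e, tuple):
        return e
    op, args = e[0], [to_cond(a) for a in e[1:]]
    if op == "and":
        return ("cond", args[0], args[1], "false")
    if op == "or":                       # assumption: A or B as (cond A true B)
        return ("cond", args[0], "true", args[1])
    if op == "not":
        return ("cond", args[0], "false", "true")
    return (op, *args)

def normalize(e):
    """Rules (2) and (3): no cond as a first argument or as a function argument."""
    if not isinstance(e, tuple):
        return e
    op, args = e[0], [normalize(a) for a in e[1:]]
    if op == "cond":
        p, a, b = args
        if isinstance(p, tuple) and p[0] == "cond":          # rule (2)
            _, q, pa, pb = p
            return normalize(("cond", q, ("cond", pa, a, b), ("cond", pb, a, b)))
        return ("cond", p, a, b)
    for k, arg in enumerate(args):                            # rule (3)
        if isinstance(arg, tuple) and arg[0] == "cond":
            _, p, a, b = arg
            return normalize(("cond", p,
                              (op, *args[:k], a, *args[k + 1:]),
                              (op, *args[:k], b, *args[k + 1:])))
    return (op, *args)
```

For instance, `normalize(("F", ("cond", "P", "A", "B")))` lifts the cond out of the argument position, as rule (3) describes.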

An  expression  in  cond  normal  form  corresponds  naturally  to  a  binary  tree  whose  nodes  are 
labelled with atomic formulas. We call this tree the cond tree for the expression. To the expression
(cond  P  A  B)  corresponds  the  tree  whose  root  is  labelled  with  P,  whose  left  son  is  the  tree  for  the 
expression  A,  and  whose  right  son  is  the  tree  for  the  expression  B.  The  tree  for  any  non-cond 
expression  E  is  a  node  with  no  sons  labelled  with  E.  Thus  every  node  in  a  cond  tree  is  either  an 
internal  node  with  two  sons  and  a  boolean  expression  as  label,  or  a  leaf  node  whose  label  is  either 
non-boolean  or  one  of  the  constants  true  or  false. 

The  maximum  number  of  nodes  in  the  cond  tree  for  an  expression  of  length  n  is  exponential 
in  n.  But,  by  sharing  structure,  the  tree  can  be  represented  as  a  directed  graph;  the  amount  of 
storage  required  is  linear  in  n. 

Let N be a node of the cond tree for some expression. Then <N_1 N_2 ... N_k> is the branch to
N if N_1 is the root of the tree, N_k = N, and, for each 1 ≤ i < k, N_{i+1} is a son of N_i. The context at
N is the conjunction L_1 ∧ ... ∧ L_{k-1}, where each L_i is the label of N_i if N_{i+1} is the left son of N_i,
and the negation of the label of N_i otherwise.

The  context  of  a  node  is  exactly  the  condition  that  must  hold  for  an  evaluator  to  reach  the 
node  during  evaluation  of  the  expression.  That  is,  if  the  conditional  expression  is  regarded  as  a 
program fragment, the context of a node is the strongest "invariant assertion" on the arc leading to
the node. For example, consider the following expression in cond normal form: (cond P (cond Q A
B) (cond R C D)). The context of the node for B, that is, P ∧ ¬Q, is the condition under which B would be
evaluated  if  the  whole  expression  were  evaluated. 
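
The definition of context can be made concrete with a short traversal. This is a minimal sketch under an assumed encoding (cond expressions as tuples ("cond", P, A, B), negation as ("not", P)); the function name is hypothetical.

```python
def leaf_contexts(tree, context=()):
    """Yield (leaf label, context) for every leaf of a cond tree.

    The context is the tuple of literals collected along the branch:
    the test itself when going to the left son, its negation otherwise.
    """
    if isinstance(tree, tuple) and tree[0] == "cond":
        _, p, a, b = tree
        yield from leaf_contexts(a, context + (p,))
        yield from leaf_contexts(b, context + (("not", p),))
    else:
        yield tree, context

# The example from the text: (cond P (cond Q A B) (cond R C D)).
example = ("cond", "P", ("cond", "Q", "A", "B"), ("cond", "R", "C", "D"))
contexts = dict(leaf_contexts(example))
```

Here contexts["B"] is ("P", ("not", "Q")), that is, P ∧ ¬Q.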


It  follows  that  the  disjunctive  normal  form  of  a  formula  is  the  disjunction  of  the  contexts  of 
the  leaves  labelled  with  true  in  the  cond  tree  for  the  formula.  Cond  normal  form  is  more  compact 
than  traditional  disjunctive  normal  form  because,  in  cond  normal  form,  disjuncts  are  represented  as 
branches  in  a  tree  (or  paths  in  a  directed  graph)  and  thus  may  share  structure. 
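
As an illustration of this correspondence, the following sketch reads a disjunctive normal form off a cond tree by collecting the contexts of the leaves labelled true. The tuple encoding and the function name are assumptions made for the example.

```python
def dnf_from_cond_tree(tree, context=()):
    """Return the contexts of all leaves labelled "true", one per disjunct."""
    if isinstance(tree, tuple) and tree[0] == "cond":
        _, p, a, b = tree
        return (dnf_from_cond_tree(a, context + (p,)) +
                dnf_from_cond_tree(b, context + (("not", p),)))
    return [context] if tree == "true" else []

# A or B, written in cond normal form as (cond A true (cond B true false)).
a_or_b = ("cond", "A", "true", ("cond", "B", "true", "false"))
disjuncts = dnf_from_cond_tree(a_or_b)
```

The result has two disjuncts, A and ¬A ∧ B. In the tree the literal ¬A is not stored separately; it is implied by taking the right son of the node labelled A.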

3.2  The  Simplification  Algorithm 

To  simplify  an  expression,  the  simplifier  traverses  its  cond  tree,  maintaining  as  it  does  so  a 
representation  of  the  context  of  the  node  it  is  visiting.  When  a  node  is  reached  with  an  inconsistent 
context,  the  node  and  the  subtree  below  it  are  ignored.  Thus  the  simplifier  "prunes"  away  all 
inconsistent  branches  in  the  tree.  The  simplifier  also  collapses  together  branches  to  leaves  with 
equivalent  labels  by  replacing  expressions  of  the  form  (cond  p  x  x)  by  x.  If  the  expression  is  a  valid 
formula,  every  leaf  which  is  reached  will  be  labelled  true;  all  these  branches  will  be  collapsed,  and 
true  will  be  returned.  Similarly  an  unsatisfiable  formula  simplifies  to  false. 

If  the  context  of  a  node  N  is  non-convex,  the  simplifier  traverses  the  subtree  rooted  at  N  once 
for  each  branch  of  the  case  split. 

SIMPLIFY  takes  two  arguments:  F,  an  expression  in  cond  normal  form,  and  CONTEXT,  a 
conjunction of literals. It returns the simplest F' such that CONTEXT ⊃ F = F'. If CONTEXT is
unsatisfiable, it returns the atomic symbol omega. We assume that omega does not appear in F. The
algorithm uses the auxiliary function SIMPATOM: if T is a term, SIMPATOM(T, CONTEXT)
returns the simplest term T' such that CONTEXT ⊃ T = T'.


SIMPLIFY(F,  CONTEXT): 

1.  If  CONTEXT  is  unsatisfiable,  return  omega. 

2. If F is not of the form (cond P A B), return SIMPATOM(F, CONTEXT).

3. If CONTEXT is not convex, let E_1 ∨ ... ∨ E_k be a disjunction of equalities entailed by
CONTEXT, no disjunct of which is entailed by CONTEXT.

Set F ← (cond E_1 F (cond E_2 F ... (cond E_k F omega) ...)).

4. F is of the form (cond P A B). Set A ← SIMPLIFY(A, P ∧ CONTEXT), B ←
SIMPLIFY(B, ¬P ∧ CONTEXT). If A = omega, return B. If B = omega, return A. If
A = B, return A. Otherwise, let P ← SIMPATOM(P, CONTEXT). If A = true and
B = false, return P. Otherwise return the expression (cond P A B).
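
The following Python sketch mirrors steps 1, 2 and 4 for a deliberately simple setting: a purely propositional context, represented as a set of literals, which is unsatisfiable exactly when it contains a literal and its negation. Such contexts are always convex, so step 3 never applies and is omitted. The encoding, the helper names, and the trivial SIMPATOM are assumptions made for this illustration, not the simplifier's actual data structures.

```python
OMEGA = "omega"

def is_cond(e):
    return isinstance(e, tuple) and e[0] == "cond"

def negate(p):
    return p[1] if isinstance(p, tuple) and p[0] == "not" else ("not", p)

def unsatisfiable(context):
    return any(negate(lit) in context for lit in context)

def simp_atom(t, context):
    """Toy SIMPATOM: only recognizes literals already decided by the context."""
    if t in context:
        return "true"
    if negate(t) in context:
        return "false"
    return t

def simplify(f, context=frozenset()):
    if unsatisfiable(context):                      # step 1
        return OMEGA
    if not is_cond(f):                              # step 2
        return simp_atom(f, context)
    _, p, a, b = f                                  # step 4 (step 3 omitted)
    a = simplify(a, context | {p})
    b = simplify(b, context | {negate(p)})
    if a == OMEGA:
        return b
    if b == OMEGA:
        return a
    if a == b:
        return a
    p = simp_atom(p, context)
    if a == "true" and b == "false":
        return p
    return ("cond", p, a, b)
```

With this sketch, the valid formula P ∨ ¬P, written (cond P true (cond P false true)), simplifies to true, and P ∧ ¬P, written (cond P (cond P false true) false), simplifies to false.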

The  proofs  of  termination,  correctness,  and  completeness  of  this  procedure  are  straightforward. 

In the remainder of this section, we describe our implementation of this simplification
algorithm. 

The  implementation  uses  a  function  ASSERT,  which  conjoins  an  arbitrary  literal  to  a  global 
context representing a conjunction of literals. The value returned by ASSERT indicates either that
the resulting conjunction is convex and satisfiable, or that the conjunction is unsatisfiable, or that
the conjunction has become non-convex. In the last case, ASSERT also specifies the case split to be
done. 


In  order  to  implement  ASSERT  efficiently,  we  require  that  the  individual  satisfiability 
programs have certain properties. An incremental satisfiability program is one which accepts literals
one  by  one  and  which  can  determine  at  any  time  whether  their  conjunction  is  satisfiable.  If  in 
addition  it  can  mark  its  state,  accept  more  literals,  and  later  return  to  the  marked  state  by  "undoing" 
the  literals  asserted  after  the  mark,  it  is  called  resettable.  To  be  used  in  our  simplifier,  a  satisfiability 
program  must  be  resettable  and  must  propagate  the  equalities  and  disjunctions  of  equalities  which 
are entailed by the conjunction it has received.

More precisely, a satisfiability program for a theory S consists of a global data structure,
CONTEXT_S, for representing conjunctions of S-literals, together with the following functions for
manipulating it.


ASSERT_S(P), where P is a literal, changes CONTEXT_S to represent Q ∧ P, where Q is the
conjunction currently represented by CONTEXT_S. If Q ∧ P is unsatisfiable, ASSERT_S(P) returns
false. Otherwise, if there are any equalities between variables which are entailed by Q ∧ P but not
by Q, then ASSERT_S(P) returns the conjunction of all such equalities. Otherwise, if Q ∧ P is
non-convex, ASSERT_S(P) returns a disjunction of equalities between variables entailed by Q ∧ P,
no disjunct of which is entailed by Q ∧ P. Otherwise, ASSERT_S(P) returns the constant true.

PUSH_S() saves the current state of CONTEXT_S.

POP_S() restores CONTEXT_S to the state it was in just before the last call to PUSH_S().

SIMPATOM_S(F), where F is an S-term or S-literal, returns an expression F' equivalent to
F in CONTEXT_S. F' is the normal form for F in this context. For example, SIMPATOM_R(x + 0)
returns x and SIMPATOM_R(x - y) returns 0 if x = y is entailed by CONTEXT_R. (SIMPATOM_S
will only be called when CONTEXT_S is consistent.)
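
To make this interface concrete, here is a toy satisfiability program for a very small theory: equalities and disequalities between variables, with no function symbols. Conjunctions in this theory are always convex, so the disjunction case never arises. The class and method names are assumptions, and PUSH is implemented by naive copying purely for brevity.

```python
class EqualityProgram:
    """Toy satisfiability program; literals are ("=", x, y) or ("!=", x, y)."""

    def __init__(self):
        self.cls = {}        # variable -> frozenset of variables equal to it
        self.diseq = set()   # pairs (x, y) asserted to be unequal
        self.saved = []      # snapshots taken by push()

    def _class(self, x):
        return self.cls.get(x, frozenset([x]))

    def _consistent(self):
        return all(self._class(a) != self._class(b) for a, b in self.diseq)

    def assert_(self, lit):
        """Return False, True, or a list of newly entailed equalities."""
        op, x, y = lit
        if op == "!=":
            self.diseq.add((x, y))
            return self._consistent()
        cx, cy = self._class(x), self._class(y)
        if cx == cy:
            return True
        merged = cx | cy
        for v in merged:
            self.cls[v] = merged
        if not self._consistent():
            return False
        return [("=", a, b) for a in sorted(cx) for b in sorted(cy)]

    def push(self):
        self.saved.append((dict(self.cls), set(self.diseq)))

    def pop(self):
        self.cls, self.diseq = self.saved.pop()

    def simp_atom(self, lit):
        """Replace a literal by "true" or "false" when the context decides it."""
        op, x, y = lit
        cx, cy = self._class(x), self._class(y)
        if cx == cy:
            equal = True
        elif any({self._class(a), self._class(b)} == {cx, cy}
                 for a, b in self.diseq):
            equal = False
        else:
            return lit
        return "true" if equal == (op == "=") else "false"
```

A typical use is to assert x = y, call push(), assert further literals, and call pop() to return to the state in which only x = y is known.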


Now  we  are  ready  to  define  ASSERT.  ASSERT  accepts  an  arbitrary  literal,  splits  it  into 
homogeneous pieces, and calls the appropriate assertion functions of the individual satisfiability
programs. We define it for the case where there are two theories S and T. The case where there
are  more  than  two  theories  is  analogous. 


In this program, P_S, P_T, Q_S, and Q_T are variables containing formulas.

ASSERT(Q):


1. Divide Q into homogeneous pieces Q_S and Q_T as described in section 2.

2. Set P_S ← ASSERT_S(Q_S), P_T ← ASSERT_T(Q_T).

3. If either P_S or P_T is false, return false.

4. If either P_S or P_T is a disjunction, return one of these disjunctions.

5. If both P_S and P_T are true, return true.

6. (One or both of the formulas is a conjunction of equalities. This step propagates the
equalities.) Set each of the variables Q_S and Q_T to be the formula P_S ∧ P_T, and go to step 2.

ASSERT  propagates  equalities  between  the  satisfiability  programs  until  one  of  them 
propagates  false,  or  one  of  them  splits  (by  returning  a  disjunction),  or  both  of  them  stabilize. 
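
The propagation loop of steps 2 to 6 can be sketched as follows. Step 1 (splitting an inhomogeneous literal) is not modelled: the caller is assumed to supply the homogeneous pieces. RuleProgram is a hypothetical stand-in for a real satisfiability program; it derives facts from a fixed table of rules and never returns a disjunction. All names are illustrative.

```python
FALSE = ("false",)

class RuleProgram:
    """Toy program: facts are tuples such as ("=", "x", "y") or ("<", "v", "w")."""

    def __init__(self, rules):
        self.rules = rules       # list of (frozenset of premises, conclusion)
        self.known = set()

    def assert_(self, facts):
        """Return False, True, or the set of newly entailed equalities."""
        before = set(self.known)
        self.known |= set(facts)
        changed = True
        while changed:
            changed = False
            for premises, conclusion in self.rules:
                if premises <= self.known and conclusion not in self.known:
                    if conclusion == FALSE:
                        return False
                    self.known.add(conclusion)
                    changed = True
        new_eqs = {f for f in self.known - before if f[0] == "="}
        return new_eqs if new_eqs else True

def assert_combined(s, t, q_s, q_t):
    while True:
        p_s, p_t = s.assert_(q_s), t.assert_(q_t)        # step 2
        if p_s is False or p_t is False:                 # step 3
            return False
        for p in (p_s, p_t):                             # step 4
            if isinstance(p, tuple) and p[0] == "or":
                return p
        if p_s is True and p_t is True:                  # step 5
            return True
        new = set()                                      # step 6
        for p in (p_s, p_t):
            if p is not True:
                new |= p
        q_s = q_t = new

# v and w play the role of labels for f(x) and f(y).
s = RuleProgram([(frozenset({("=", "x", "y")}), ("=", "v", "w"))])
t = RuleProgram([
    (frozenset({("<=", "x", "y"), ("<=", "y", "x")}), ("=", "x", "y")),
    (frozenset({("=", "v", "w"), ("<", "v", "w")}), FALSE),
])
first = assert_combined(s, t, set(), {("<=", "x", "y"), ("<=", "y", "x")})
second = assert_combined(s, t, set(), {("<", "v", "w")})
```

In this run the arithmetic-like program t derives x = y, the program s then derives v = w, and the equality is propagated back to t. The first call stabilizes and returns True; the second call, asserting v < w, returns False.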

Notice that a term t in an inhomogeneous literal which has been replaced by a new variable v
in step 1 of some call to ASSERT may in a subsequent call be replaced by another new variable w.
This is all right, since both t = v and t = w are sent to the same satisfiability program, which will
propagate v = w.

It  is  not  necessary  to  send  all  the  equalities  to  all  the  satisfiability  programs  in  step  6.  As 
mentioned  in  section  2.2,  an  equality  need  only  be  sent  to  a  satisfiability  program  if  both  variables 
in  the  equality  are  parameters  of  the  conjunction  represented  in  the  program. 

There is one feature of the first simplification algorithm described above which makes it
unsuitable for implementation. If CONTEXT is represented by CONTEXT_S and CONTEXT_T,
then the tests in steps 1 and 3 require a case split if either CONTEXT_S or CONTEXT_T is
non-convex. If two or more of the cases are satisfiable, the simplifier will repeat the case split. A
better approach is to return omega from step 1 if CONTEXT_S or CONTEXT_T is unsatisfiable,
and to split in step 3 if CONTEXT_S or CONTEXT_T is non-convex. Using this approach, the tests
can be made without a case split, since a case split is not necessary to determine if an individual
context is unsatisfiable or non-convex. ASSERT avoids redundant case splits by returning
immediately if one of the satisfiability programs splits, without checking if any of the branches of
the case split is satisfiable.

In addition to ASSERT, the second simplification algorithm uses the functions PUSH, POP,
and SIMPATOM. PUSH and POP simply call the push and pop functions for each of the
satisfiability programs. SIMPATOM takes an arbitrary term or literal and simplifies it using the
information in CONTEXT_S and CONTEXT_T by calling the appropriate SIMPATOM functions.
A record is kept of the individual variables generated as labels for terms in step 1, so that
SIMPATOM can put the literals back together, replacing generated labels by the terms they
represent. We omit the details.


The  following  simplification  algorithm  is  a  refinement  of  the  one  given  above. 


SIMPLIFY(F):

1. If F is not of the form (cond P A B), return SIMPATOM(F).

2. F is of the form (cond P A B). Call PUSH(). Set Q ← ASSERT(P).

If Q = false, then POP() and return SIMPLIFY(B).

If Q = true, then set A ← SIMPLIFY(A), POP() and go to step 3.

Otherwise, Q is a disjunction E_1 ∨ ... ∨ E_k.

Set A ← SIMPLIFY((cond E_1 A ... (cond E_k A omega) ...)), POP() and go to step 3.

3. Call PUSH(). Set Q ← ASSERT(¬P).

If Q = false, then POP() and return A.

If Q = true, then set B ← SIMPLIFY(B), POP() and go to step 4.

Otherwise, Q is a disjunction E_1 ∨ ... ∨ E_k.

Set B ← SIMPLIFY((cond E_1 B ... (cond E_k B omega) ...)), POP() and go to step 4.

4. If A = omega, return B. If B = omega, return A. If A = B, return A. Otherwise, let
P ← SIMPATOM(P). If A = true and B = false, return P. Otherwise return the expression
(cond P A B).
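
A sketch of this refined algorithm, again for a purely propositional toy context, shows how the explicit CONTEXT argument is replaced by PUSH, ASSERT and POP on a global structure. Because propositional contexts are always convex, ASSERT here only returns true or false, and the branches that handle a returned disjunction are left out. The Context class and all names are assumptions made for illustration.

```python
OMEGA = "omega"

def is_cond(e):
    return isinstance(e, tuple) and e[0] == "cond"

def negate(p):
    return p[1] if isinstance(p, tuple) and p[0] == "not" else ("not", p)

class Context:
    """Toy global context: a list of literals plus a stack of marks."""

    def __init__(self):
        self.lits = []
        self.marks = []

    def push(self):
        self.marks.append(len(self.lits))

    def pop(self):
        del self.lits[self.marks.pop():]

    def assert_(self, lit):
        # Assumes the context was consistent before the call.
        self.lits.append(lit)
        return negate(lit) not in self.lits

    def simp_atom(self, t):
        if t in self.lits:
            return "true"
        if negate(t) in self.lits:
            return "false"
        return t

def simplify(f, ctx):
    if not is_cond(f):                       # step 1
        return ctx.simp_atom(f)
    _, p, a, b = f
    ctx.push()                               # step 2
    if ctx.assert_(p) is False:
        ctx.pop()
        return simplify(b, ctx)
    a = simplify(a, ctx)
    ctx.pop()
    ctx.push()                               # step 3
    if ctx.assert_(negate(p)) is False:
        ctx.pop()
        return a
    b = simplify(b, ctx)
    ctx.pop()
    if a == OMEGA:                           # step 4
        return b
    if b == OMEGA:
        return a
    if a == b:
        return a
    p = ctx.simp_atom(p)
    if a == "true" and b == "false":
        return p
    return ("cond", p, a, b)
```

Every push is matched by a pop on each path, so the context is left exactly as it was found when simplify returns.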


We sketch the proof of the completeness of the algorithm. Whenever CONTEXT_S or
CONTEXT_T is non-convex, SIMPLIFY calls itself recursively on some cond expression. Thus
whenever its argument is not a cond expression, CONTEXT_S and CONTEXT_T are convex. By
the definition of ASSERT, CONTEXT_S and CONTEXT_T entail the same set of equalities when
ASSERT returns. It follows from lemma 5 that if CONTEXT_S and CONTEXT_T are convex,
satisfiable, and entail the same set of equalities, then their conjunction is satisfiable. Therefore
whenever SIMPLIFY returns from step 1, the context is consistent. If F is valid, every leaf of its
cond tree with a consistent context is labelled with true, so every term returned in step 1 is true. It
follows by induction that A and B are always true, and therefore that the algorithm is complete.

3.4 Comparison with DNF-style Theorem Proving


We  do  not  know  how  to  give  an  adequate  analysis  of  our  simplifier.  Its  behaviour  in  practice  is 
much  better  than  its  worst  case  behaviour.  Instead,  we  will  compare  our  approach,  using  cond 
normal  form,  with  an  obvious  alternative  approach,  using  disjunctive  normal  form,  which  we  call  a 
DNF-style  approach.  The  DNF-style  approach  is  not  suited  to  arbitrary  simplification,  but  only  to 
proving  the  validity  of  formulas. 

Let  F  be  a  formula  represented  as  a  cond  tree  with  n  internal  nodes.  The  most  obvious 
algorithm  to  determine  if  F  is  provable  is  to  put  its  negation  into  disjunctive  normal  form  and  test 
each  disjunct  for  unsatisfiability.  This  corresponds  to  testing  that  the  context  of  each  leaf  labelled 
with  false  is  unsatisfiable.  The  standard  DNF-style  approach  builds  up  the  context  for  each  leaf 
from  scratch,  that  is,  from  the  root  of  the  cond  tree.  The  number  of  calls  to  ASSERT  equals  the 
sum,  taken  over  all  leaf  nodes  labelled  with  false,  of  the  length  of  the  branch  to  the  leaf.  This  sum 
varies from O(n) to O(n^2), and has an average value of O(n^1.5), if one considers all binary trees
with n internal nodes and all external node labellings with true or false to be equally likely. There
are no calls to PUSH or POP. A non-resettable satisfiability program can be used.

Our  algorithm  makes  n  calls  to  PUSH,  n  calls  to  POP,  and  2n  calls  to  ASSERT.  Therefore, 
DNF-style  algorithms  minimize  (to  zero)  the  number  of  calls  to  PUSH  and  POP,  while  our 
algorithm  minimizes  the  number  of  calls  to  ASSERT.  To  determine  which  method  is  better,  we 
would  need  to  know  the  expected  number  of  calls  to  ASSERT  which  each  algorithm  makes  on 
realistic  input  distributions  and  the  relative  costs  of  resettable  satisfiability  programs  and 
non-resettable  ones. 

The  formulas  which  arise  in  the  Stanford  Verifier  are  often  implications  between 
conjunctions  of  literals.  (Formulas  with  this  structure  arise  in  program  verification  whenever  the 
invariant assertion on a simple loop is a conjunction of literals.) If there are n conjuncts in the
antecedent of such a formula and m conjuncts in the consequent, then the disjunctive normal form
of the negation of the formula has length m(n + 1), while the cond tree has only m + n internal
nodes. A DNF-style algorithm can therefore make as many as m(n + 1) calls to ASSERT, while our
algorithm can make at most m + n calls to ASSERT, PUSH and POP.
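
The two sizes quoted above can be checked on a small instance. The sketch below builds, for an implication between conjunctions, both the disjunctive normal form of its negation and its cond tree, and counts literals and internal nodes. The encoding and function names are assumptions made for the example.

```python
def negated_dnf(antecedent, consequent):
    """DNF of not((L1 and ... and Ln) implies (R1 and ... and Rm)):
    one disjunct L1 and ... and Ln and not Rj for each j."""
    return [tuple(antecedent) + (("not", r),) for r in consequent]

def cond_tree(antecedent, consequent):
    """Cond tree of the implication itself."""
    tree = "true"
    for r in reversed(consequent):      # R1 and ... and Rm
        tree = ("cond", r, tree, "false")
    for l in reversed(antecedent):      # each Li guards the rest
        tree = ("cond", l, tree, "true")
    return tree

def internal_nodes(tree):
    if isinstance(tree, tuple) and tree[0] == "cond":
        return 1 + internal_nodes(tree[2]) + internal_nodes(tree[3])
    return 0

n, m = 3, 2
antecedent = ["L1", "L2", "L3"]
consequent = ["R1", "R2"]
dnf_length = sum(len(d) for d in negated_dnf(antecedent, consequent))
tree_size = internal_nodes(cond_tree(antecedent, consequent))
```

For n = 3 and m = 2 this gives a DNF of total length m(n + 1) = 8 against m + n = 5 internal nodes in the cond tree.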


3.5  Finding  the  Simplest  Form 

In  this  section,  we  note  some  problems  with  our  simplifier.  The  problems  do  not  arise  when  our 
simplifier  is  used  as  a  theorem  prover,  but  only  when  it  is  being  used  to  simplify  expressions  which 
do  not  simplify  to  an  atomic  symbol  such  as  true.  These  problems  arise  in  the  design  of  any 
simplification  algorithm. 

First,  a  problem  common  to  all  normal  forms  is  that  they  may  lose  some  of  the  structure  of  the 
original expression. It is hard to recover this structure if the expression does not significantly
simplify. For instance, using cond normal form, the formula (A ∨ B ∨ C) ∧ (D ∨ E ∨ F) is
"simplified" to


(cond A (cond E true (cond D true F))
        (cond B (cond E true (cond D true F))
                (cond C (cond E true (cond D true F))
                        false)))


and  (cond  E  true  (cond  D  true  F))  is  duplicated  in  three  places.  Our  simplifier  converts  this 
formula  back  to  a  formula  involving  the  usual  boolean  connectives,  but  the  present  version  of  the 
simplifier  does  not  find  the  original,  simplest  form  of  the  expression. 

Another problem occurs when simplifying conjunctions like x ≤ y ∧ y ≤ x ∧ x = y. The
simplifier discovers that the last equality is redundant and simplifies the conjunction to
x ≤ y ∧ y ≤ x instead of to x = y. (Had the equality appeared first, both inequalities would have
been removed as redundant.) Handling this problem requires extending the set of primitives for
manipulating contexts. For example, if a call to ASSERT made earlier conjuncts in the context
redundant, this might be detected and exploited.

A significant problem concerns implementing the test A = B in step 4 of our simplification
algorithm. This is intended to collapse branches of the cond tree which lead to identical results. If A
or B are atomic symbols, there is no problem. If they contain conds, testing for logical equivalence is
possible but probably impractical. If they contain no conds, then testing them for equality (using the
LISP EQUAL) will usually be sufficient, if SIMPATOM puts expressions into a canonical form.
There is, however, a difficulty: consider (cond (= X 1) (F 1) (F X)), which we would like to simplify
to (F X). Our SIMPATOM chooses (F 1), not (F X), as the canonical form when X = 1 is known, so
in step 4 A is (F 1) and B is (F X). A completely adequate test for collapsing the two branches
would require testing whether Q ∧ P entailed A = B, in which case B should be returned, otherwise
whether Q ∧ ¬P entailed A = B, in which case A should be returned. (Q is the context of F, which
is of the form (cond P A B).) Again the overhead may be prohibitive. This problem actually arises
frequently  and  is  more  troublesome  in  practice  than  any  of  the  other  problems  we  have  mentioned 
in  this  section. 


4.  Notes 


The language accepted by the simplifier is richer than that described in section 1. All predicates
(including =) and boolean connectives are considered boolean-valued functions (that is, functions
which  evaluate  to  the  booleans  true  and  false).  Terms  are  allowed  to  contain  arbitrary 
boolean-valued  expressions.  Expressions  are  allowed  as  functions.  The  following  simplifications 
illustrate  this  generality. 


F(true) ⊃ F(X ∨ ¬X);
true;

cond(true, F, G)(X);
F(X);




Our  simplifier  does  not  enforce  strict  typing.  For  instance,  cons(X,  Y)  +  store(V,  I,  E)  is  an 
acceptable  expression  (that  the  simplifier  will  simplify  to  itself).  We  plan  to  add  type  predicates  (or 
type  constants  and  a  type  function)  to  the  next  version  of  our  simplifier. 

The simplifier does not store conjunctions of atomic formulas as strings or LISP s-expressions,
but instead in a graph with one vertex for each term and subterm in the conjunction. Another data
structure is used to represent an equivalence relation on the vertices. Two vertices are equivalent if
the terms they represent are known to be equal in this context. To propagate an equality, a
satisfiability procedure merges two equivalence classes; this can be done very efficiently. More details
of this representation are given in [Nelson and Oppen 1978].

Using this representation, it is not necessary to generate "labels" for terms which appear in
inhomogeneous  literals. 

This  representation  also  allows  the  implementation  of  other  routines  in  our  simplifier  to  be 
more  efficient,  such  as  PUSH  and  POP.  Obviously,  one  way  to  implement  PUSH  would  be  to  have 
it  make  a  physical  copy  of  the  existing  context;  this  is  not  very  satisfactory.  The  approach  we  take  is 
to  keep  a  history  of  all  changes  we  make  to  our  global  data  structure;  popping  then  involves  undoing 
these  changes  until  we  reach  the  context  of  the  last  call  to  PUSH. 
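
The history-based approach to PUSH and POP can be illustrated on an equivalence relation over vertices. The sketch below is an assumption-laden toy, not the simplifier's data structure: it uses a union-find forest without path compression, so that each merge is a single recorded write that POP can undo.

```python
class UndoableUnionFind:
    """Equivalence classes over hashable vertices with push/pop by undo trail."""

    def __init__(self):
        self.parent = {}   # non-root vertex -> parent; roots are absent
        self.trail = []    # history of vertices that were given a parent
        self.marks = []    # trail lengths recorded by push()

    def find(self, x):
        while x in self.parent:
            x = self.parent[x]
        return x

    def merge(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[rx] = ry
            self.trail.append(rx)      # remember the change for undoing

    def push(self):
        self.marks.append(len(self.trail))

    def pop(self):
        mark = self.marks.pop()
        while len(self.trail) > mark:
            del self.parent[self.trail.pop()]   # undo merges, newest first

uf = UndoableUnionFind()
uf.merge("x", "y")
uf.push()
uf.merge("y", "z")
merged_inside = uf.find("x") == uf.find("z")
uf.pop()
merged_after = uf.find("x") == uf.find("z")
```

Inside the push/pop pair x and z are equivalent; after the pop only the earlier merge of x and y remains. No copy of the context is ever made.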

Our simplifier is not a general purpose theorem prover; it cannot prove quantified theorems
of the predicate calculus. However, in the Stanford Verifier, it is used in conjunction with a
program called the rule handler which accepts user-supplied lemmas. During a simplification, the
rule handler instantiates the free variables of the lemmas and sends the instantiated lemmas to the
simplifier. In our system, the rule handler stands in the same relation to the simplifier as the
satisfiability programs. The rule handler can be viewed as a satisfiability program driven by
user-supplied  axioms. 